README 

Publication title: Correlated impulses: Using Facebook interests to improve predictions of crime rates in urban areas

The dataset contains data collected from 3 sources (described below). The data was collected at the ZIP code level for 9 highly populated US cities (Baltimore, Boston, Chicago, Dallas, Los Angeles, New York, Philadelphia, San Francisco, Washington).



(i) Facebook Advertising data:
This data was collected from Facebook's marketing API, which is the platform advertisers use for targeted advertising on Facebook. It provides estimates of the number of Monthly Active Users (MAU) on Facebook matching specified targeting criteria. For this study, estimates were collected for users with various media-related interests and for various combinations of age, gender and relationship status. This data was collected at the ZIP code level for the 9 cities listed above.

For example:

# estimates of Monthly active users
* The value 830 in the 1st row and 4th column (with header Action_games_A_18-24) indicates that, when the data was collected, there were 830 Monthly Active Users (MAU) on Facebook aged 18-24 who were interested in "action games" in ZIP code 2021 (in Boston).
* The values in the 569th column (with header FB_user_A_18-24_I) indicate the number of Monthly Active Facebook users at the time of data collection who were aged 18-24 and whose relationship status was "in a relationship".
* The values in the 708th column (with header Online_Dating_Service_M_35p_M) indicate the number of monthly active Facebook users who were interested in Online dating services at the time of data collection and who were Male, aged 35 plus with a "married" relationship status.

# female to male ratios and fractions of users
* The values in column 1280 (with header FB_user_18-24_D_ratio) indicate the ratio of female to male Monthly Active Facebook users among those aged 18-24 whose relationship status is "dating".
* The values in column 1307 (with header Online_Dating_Service_F_18-34_I_frac) indicate the number of Monthly Active Facebook users interested in "Online Dating Services" who were female, aged 18-34 and "in a relationship" at the time of data collection, as a fraction of the overall number of Monthly Active Facebook users who were female, aged 18-34 and "in a relationship".
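
As a rough illustration of how the ratio and fraction columns described above relate to the underlying MAU estimates, the following Python sketch recomputes both from raw counts. The function names and the numbers are hypothetical and chosen only for illustration. How the dataset treats a zero denominator is not stated in this README, so returning None in that case is an assumption.

```python
def female_to_male_ratio(female_mau, male_mau):
    """Ratio of female to male MAU for one age / relationship-status group."""
    if male_mau == 0:
        return None  # assumption: ratio is undefined when there are no male users
    return female_mau / male_mau


def interest_fraction(interest_mau, group_mau):
    """Share of a demographic group's MAU that has a given interest."""
    if group_mau == 0:
        return None  # assumption: fraction is undefined for an empty group
    return interest_mau / group_mau


# Made-up counts for a single ZIP code (not taken from the dataset).
print(female_to_male_ratio(600, 400))  # 1.5
print(interest_fraction(50, 1000))     # 0.05
```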


The following codes, where present, are used to encode gender, age and relationship status for the Facebook variables:

* Gender: A = All genders, F = Female, M = Male
* Age: 18-24, 18-34, 18p (18 plus), 35p (35 plus)
* Relationship status: S = Single, D = Dating, M = Married, U = Unspecified, I = In a relationship, O = In an open relationship
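
The codes above can be used to split a column header into its parts. The Python sketch below is one possible way to do this and is not part of the dataset. It assumes every Facebook header follows the pattern seen in the examples in this README (interest name, optional gender code, age code, optional relationship code, optional "ratio" or "frac" suffix) and that no interest name ends in a token that looks like a gender code. The function name parse_header is hypothetical.

```python
AGES = {"18-24", "18-34", "18p", "35p"}
GENDERS = {"A", "F", "M"}
STATUSES = {"S", "D", "M", "U", "I", "O"}
SUFFIXES = {"ratio", "frac"}


def parse_header(header):
    """Split a Facebook column header into its components."""
    tokens = header.split("_")

    # An optional trailing "ratio" or "frac" marks a derived column;
    # otherwise the column holds a raw MAU estimate.
    measure = tokens.pop() if tokens[-1] in SUFFIXES else "MAU"

    # Find the age code, scanning from the right. "M" can mean Male or
    # Married, so position relative to the age code disambiguates it.
    age_idx = next(
        (i for i in range(len(tokens) - 1, -1, -1) if tokens[i] in AGES), None
    )
    if age_idx is None:
        raise ValueError(f"no age code found in {header!r}")

    after = tokens[age_idx + 1:]
    if len(after) > 1 or (after and after[0] not in STATUSES):
        raise ValueError(f"unexpected tokens after age code in {header!r}")
    status = after[0] if after else None

    before = tokens[:age_idx]
    gender = before.pop() if before and before[-1] in GENDERS else None

    return {
        "interest": "_".join(before),
        "gender": gender,
        "age": tokens[age_idx],
        "status": status,
        "measure": measure,
    }


print(parse_header("Online_Dating_Service_M_35p_M"))
print(parse_header("FB_user_18-24_D_ratio"))
```

With this approach the second "M" in Online_Dating_Service_M_35p_M is read as "Married" because it follows the age code, while the first is read as "Male" because it precedes it.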

The Facebook data were collected from Facebook's marketing API using the following Python-based library: https://github.com/maraujo/pySocialWatcher. All data was collected over the period October-November 2017.




(ii) Demographic measures:
The dataset contains variables compiled from the 2015 American Community Survey (ACS). These variables provide information on age composition, income and educational attainment levels for the ZIP codes in the dataset.

Source:  US Census Bureau. 2012-2016 American Community Survey 5-Year Estimates;
2015. Available from: https://factfinder.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t





(iii) Data on crime events and crime rates:
The dataset also contains the total number of reported incidents in 2017 for three types of crime (namely assaults, burglaries and robberies) for the ZIP codes in the dataset. This data was collected from the open data portals of the individual cities in the dataset.

The crime rate for each crime type is computed as the total number of reported incidents of that type divided by the ZIP code's population; this is multiplied by 100,000 to give the crime rate per 100,000 people.
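
The crime rate calculation described above can be sketched in Python as follows. The function name and the example numbers are hypothetical and are not taken from the dataset. Raising an error for a non-positive population is an assumption, since this README does not say how such ZIP codes were handled.

```python
def crime_rate_per_100k(incidents, population):
    """Reported incidents per 100,000 residents of a ZIP code."""
    if population <= 0:
        # assumption: a rate is not meaningful without a positive population
        raise ValueError("population must be positive")
    return 100_000 * incidents / population


# Made-up example: 150 reported incidents in a ZIP code of 30,000 people.
print(crime_rate_per_100k(150, 30_000))  # 500.0
```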
